Informative term selection for automatic query expansion

Authors

  • Claudio Carpineto
  • Renato De Mori
  • Giovanni Romano
Abstract

Techniques for query expansion from top retrieved documents have recently been used by many groups at TREC, often on purely empirical grounds. In this paper we present a novel method for ranking and weighting expansion terms. The method is based on the concept of relative entropy, or Kullback-Leibler distance, developed in Information Theory, from which we derive a computationally simple and theoretically justified formula to assign scores to candidate expansion terms. This method has been incorporated into a comprehensive prototype ranking system, tested in the ad hoc track of TREC-7. The system's overall performance was comparable to the median performance of TREC-7 participants, which is quite good considering that we are new to TREC and that we used unsophisticated indexing and weighting techniques. More focused experiments showed that the use of an information-theoretic component for query expansion significantly improved mean retrieval effectiveness over the unexpanded query, yielding performance gains as high as 14% (for non-interpolated average precision), while a per-query analysis suggested that queries that are neither too difficult nor too easy can be more easily improved upon.
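
The abstract does not reproduce the scoring formula itself, so the following is only a minimal sketch of one common way to turn relative entropy into a per-term score: each candidate term is scored by its contribution to the Kullback-Leibler divergence between the term distribution of the top retrieved documents and that of the whole collection. The function name `kld_term_scores`, the use of unsmoothed maximum-likelihood estimates, and the toy documents are assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter


def kld_term_scores(feedback_docs, collection_docs):
    """Score terms by their contribution to KL(feedback || collection).

    Both arguments are lists of tokenized documents (lists of strings).
    Hypothetical helper: probabilities are plain relative frequencies,
    with no smoothing (an assumption of this sketch).
    """
    fb_counts = Counter(t for doc in feedback_docs for t in doc)
    coll_counts = Counter(t for doc in collection_docs for t in doc)
    fb_total = sum(fb_counts.values())
    coll_total = sum(coll_counts.values())

    scores = {}
    for term, count in fb_counts.items():
        if coll_counts[term] == 0:
            # Term unseen in the collection: skipped here rather than smoothed.
            continue
        p_fb = count / fb_total
        p_coll = coll_counts[term] / coll_total
        # Per-term summand of the KL divergence.
        scores[term] = p_fb * math.log(p_fb / p_coll)
    return scores


# Toy collection; the first two documents play the role of the
# top retrieved (pseudo-relevant) documents.
collection = [
    ["jaguar", "cat", "jungle", "the"],
    ["jaguar", "cat", "prey", "the"],
    ["stock", "market", "the", "news"],
    ["car", "market", "the", "speed"],
]
scores = kld_term_scores(collection[:2], collection)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this sketch a term that is as frequent in the feedback documents as in the collection (here `the`) scores zero, while terms concentrated in the feedback documents score higher, which is the behaviour one would want from an expansion-term ranking.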

Related resources

Comparison of Global Term Expansion Methods for Text Retrieval

This paper describes our work at the fifth NTCIR workshop on the subtasks of single language information retrieval (SLIR). Several automatic global query expansion strategies were explored based on a machine-derived thesaurus. These term selection strategies were compared with manual selection and local expansion. Experiments show that all the global expansion strategies perform worse than the s...

Query Expansion Using a Collection Dependent Probabilistic Latent Semantic Thesaurus

Many queries on collections of text documents are too short to produce informative results. Automatic query expansion is a method of adding terms to the query without interaction from the user in order to obtain more refined results. In this investigation, we examine our novel automatic query expansion method using the probabilistic latent semantic thesaurus, which is based on probabilistic lat...

Query Expansion Using Term Distribution and Term Association

Good term selection is an important issue for an automatic query expansion (AQE) technique. AQE techniques that select expansion terms from the target corpus usually do so in one of two ways. Distribution based term selection compares the distribution of a term in the (pseudo) relevant documents with that in the whole corpus / random distribution. Two well-known distribution-based methods are b...

Evolving Term-Selection Schemes for Pseudo-Relevance Feedback in Information Retrieval

Automatic query expansion in Information Retrieval aims to improve retrieval performance by overcoming the problem of term mismatch between a query and its relevant documents. Pseudo-relevance (blind) feedback techniques have been shown to be of benefit on large TREC collections in recent years. This technique analyses terms in the top few documents deemed relevant by the system, reformulates th...

Improving Document Retrieval by Automatic Query Expansion Using Collaborative Learning of Term-Based Concepts

Query expansion methods have been studied for a long time, with debatable success in many instances. In this paper, a new approach is presented based on using term concepts learned by other queries. Two important issues with query expansion are addressed: the selection and the weighting of additional search terms. In contrast to other methods, the regarded query is expanded by adding those term...

Publication date: 1999